Counting Distinct Elements in a Data Stream

Authors

  • Ziv Bar-Yossef
  • T. S. Jayram
  • Ravi Kumar
  • D. Sivakumar
  • Luca Trevisan
Abstract

We present three algorithms to count the number of distinct elements in a data stream to within a factor of 1 ± ε. Our algorithms improve upon known algorithms for this problem, and offer a spectrum of time/space tradeoffs.
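To make the 1 ± ε guarantee concrete, here is a minimal Python sketch of one well-known style of estimator: hash every element to a pseudo-random value in (0, 1], remember only the k smallest distinct hash values, and estimate the distinct count from the k-th smallest one. This is an illustration under stated assumptions and not necessarily any of the three algorithms in the paper. The names `estimate_distinct` and `_hash01`, the use of SHA-256 as the hash function, and the default `k=256` are all choices made for this example.

```python
import hashlib
import heapq


def _hash01(item):
    """Map an item to a pseudo-random float in (0, 1].

    SHA-256 is an assumed stand-in for a random hash function.
    """
    digest = hashlib.sha256(repr(item).encode("utf-8")).digest()
    return (int.from_bytes(digest[:8], "big") + 1) / 2**64


def estimate_distinct(stream, k=256):
    """Estimate the number of distinct items from the k smallest hash values."""
    heap = []        # negated hashes, so heap[0] is the largest of the k kept
    members = set()  # hash values currently kept, used to skip repeats
    for item in stream:
        h = _hash01(item)
        if h in members:
            continue  # this value is already among the k smallest
        if len(heap) < k:
            heapq.heappush(heap, -h)
            members.add(h)
        elif h < -heap[0]:
            # Smaller than the current k-th smallest: swap it in.
            evicted = -heapq.heappushpop(heap, -h)
            members.discard(evicted)
            members.add(h)
    if len(heap) < k:
        # Fewer than k distinct hash values were seen, so the count is exact
        # (barring hash collisions).
        return len(heap)
    kth_smallest = -heap[0]
    return int((k - 1) / kth_smallest)


if __name__ == "__main__":
    # 10,000 distinct values, each appearing three times in the stream.
    stream = [i % 10_000 for i in range(30_000)]
    print(estimate_distinct(stream))
```

Memory use depends on k rather than on the stream length. For this kind of estimator the relative error typically shrinks roughly like 1/√k, so a smaller target ε calls for a larger k.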

Similar resources

An Evaluation of Streaming Algorithms for Distinct Counting Over a Sliding Window

Counting the number of distinct elements in a data stream (distinct counting) is a fundamental aggregation task in database query processing, query optimization, and network monitoring. On a stream of elements, it is commonly needed to compute an aggregate over only the most recent elements, leading to the problem of distinct counting over a “sliding window” of the stream. We present a detailed...

Sketching and streaming —

Distinct elements (F0). In this note we will consider the distinct elements problem, also known as the F0 problem, defined as follows. We are given a stream of integers i_1, ..., i_m ∈ [n], where [n] denotes the set {1, 2, ..., n}. We would like to output the number of distinct elements seen in the stream. As with Morris’ approximate counting algorithm, our goal will be to minimize our space...

Range-Efficient Counting of Distinct Elements in a Massive Data

Efficient one-pass estimation of F0, the number of distinct elements in a data stream, is a fundamental problem arising in various contexts in databases and networking. We consider range-efficient estimation of F0: estimation of the number of distinct elements in a data stream where each element of the stream is not just a single integer, but an interval of integers. We present a randomized alg...

Data Streams as Random Permutations: the Distinct Element Problem

We illustrate this by introducing RECORDINALITY, an algorithm which estimates the number of distinct elements in a stream by counting the number of k-records occurring in it. The algorithm has a score of interesting properties, such as providing a random sample of the set underlying the stream. To the best of our knowledge, a modified version of RECORDINALITY is the first cardinality estimation...

Range-Efficient Counting of Distinct Elements in a Massive Data Stream

Efficient one-pass estimation of F0, the number of distinct elements in a data stream, is a fundamental problem arising in various contexts in databases and networking. We consider range-efficient estimation of F0: estimation of the number of distinct elements in a data stream where each element of the stream is not just a single integer but an interval of integers. We present a randomized algor...



Publication date: 2002